Bulk Messaging System

Documentation

Back to Home
Home Projects Bulk Messaging System Python Backend Services Contact Extraction Utilities CSV File Extraction

CSV File Extraction

Table of Contents#

  1. Introduction

  2. Project Structure

  3. Core Components

  4. Architecture Overview

  5. Detailed Component Analysis

  6. Dependency Analysis

  7. Performance Considerations

  8. Troubleshooting Guide

  9. Conclusion

Introduction#

This document explains the CSV file contact extraction functionality used to import phone numbers and names from CSV files for bulk messaging. It covers the automatic column detection algorithm, fallback parsing when pandas fails, phone number cleaning and validation, supported formats and column naming variations, and common parsing errors with their solutions.

Project Structure#

The CSV extraction feature is implemented in the Python backend module and integrates with the Flask API. The relevant files are organized as follows:

  • Python backend utilities for contact extraction and number validation

  • Flask API endpoints for uploading CSV files and extracting contacts

  • Manual number parsing utilities for alternative input formats

  • Dependencies required for CSV and Excel processing

graph TB subgraph "Python Backend" APP["Flask API
app.py"] EXTRACT["CSV Extraction
extract_contacts.py"] MANUAL["Manual Numbers Parser
parse_manual_numbers.py"] VALIDATE["Number Validator
validate_number.py"] REQ["Dependencies
requirements.txt"] end APP --> EXTRACT APP --> MANUAL APP --> VALIDATE EXTRACT --> REQ MANUAL --> REQ VALIDATE --> REQ

Diagram sources

Section sources

Core Components#

  • CSV extraction with automatic column detection for phone and name columns

  • Fallback manual CSV reader with UTF-8 encoding handling

  • Phone number cleaning and validation with international number formatting

  • Flask API endpoint for CSV uploads and contact extraction

  • Manual number parsing utilities for alternative input formats

Key implementation references:

Section sources

Architecture Overview#

The CSV extraction pipeline consists of:

  • A Flask route that receives CSV uploads

  • An extraction function that attempts pandas-based parsing

  • A fallback manual CSV reader for robustness

  • A shared phone number cleaning and validation routine

sequenceDiagram participant Client as "Client" participant API as "Flask API
app.py" participant Extract as "CSV Extractor
extract_contacts.py" participant Pandas as "pandas.read_csv()" participant CSVReader as "csv.reader
manual fallback" participant Cleaner as "clean_phone_number()
validation" Client->>API : "POST /upload with CSV file" API->>Extract : "extract_contacts_from_csv(file_path)" Extract->>Pandas : "Try read CSV" alt "pandas succeeds" Pandas-->>Extract : "DataFrame" Extract->>Extract : "Detect phone/name columns" loop "For each row" Extract->>Cleaner : "clean_phone_number(phone_value)" Cleaner-->>Extract : "Normalized phone or None" Extract-->>API : "Contact list" end else "pandas fails" Extract->>CSVReader : "Open file with UTF-8 encoding" CSVReader-->>Extract : "Rows" loop "For each row" Extract->>Cleaner : "clean_phone_number(row[0])" Cleaner-->>Extract : "Normalized phone or None" Extract-->>API : "Contact list" end end API-->>Client : "JSON response with contacts"

Diagram sources

Detailed Component Analysis#

Automatic Column Detection Algorithm#

The system detects phone and name columns using keyword matching:

  • Phone keywords: phone, number, mobile, cell, tel

  • Name keywords: name, contact, person

Detection logic:

  • Iterate through DataFrame columns and convert header to lowercase

  • Match headers against phone and name keyword sets

  • Select the first phone column found; otherwise default to the first column

  • Select the first name column found; otherwise default to the second column if available

flowchart TD Start(["Start CSV Extraction"]) --> LoadDF["Load CSV with pandas"] LoadDF --> DetectCols["Iterate columns and detect by keywords"] DetectCols --> HasPhone{"Phone columns found?"} HasPhone --> |Yes| UsePhone["Use first phone column"] HasPhone --> |No| UseFirst["Use first column as phone"] DetectCols --> HasName{"Name columns found?"} HasName --> |Yes| UseName["Use first name column"] HasName --> |No| UseSecondOrNone["Use second column if exists else None"] UsePhone --> IterateRows["Iterate rows"] UseName --> IterateRows UseFirst --> IterateRows UseSecondOrNone --> IterateRows IterateRows --> CleanPhone["clean_phone_number()"] CleanPhone --> Valid{"Valid phone?"} Valid --> |Yes| AddContact["Add contact with number and optional name"] Valid --> |No| NextRow["Skip row"] AddContact --> NextRow NextRow --> End(["Return contacts"])

Diagram sources

Section sources

Fallback Parsing Mechanism (Manual CSV Reader)#

When pandas fails to parse the CSV, the system falls back to a manual CSV reader:

  • Opens the file with UTF-8 encoding

  • Reads rows using csv.reader

  • Uses the first column as phone and the second column as name if present

  • Applies the same phone number cleaning routine

flowchart TD Start(["Fallback CSV Parsing"]) --> OpenFile["Open file with UTF-8 encoding"] OpenFile --> ReadHeaders["Read headers (if any)"] ReadHeaders --> LoopRows["Iterate rows"] LoopRows --> RowEmpty{"Row empty?"} RowEmpty --> |Yes| NextRow["Skip row"] RowEmpty --> |No| CleanPhone["clean_phone_number(row[0])"] CleanPhone --> Valid{"Valid phone?"} Valid --> |Yes| GetName["Get row[1] if present and non-empty"] GetName --> AddContact["Add contact with number and optional name"] Valid --> |No| NextRow AddContact --> NextRow NextRow --> End(["Return contacts"])

Diagram sources

Section sources

Phone Number Cleaning and Validation#

The cleaning routine performs the following steps:

  • Strip whitespace and handle null/NaN values

  • Remove separators (-, spaces, parentheses, dots)

  • Allow only digits and plus sign

  • Remove leading zeros if not international format

  • Prefix with plus if the number appears international-length (>10 digits) and lacks plus

  • Validate digit count (between 7 and 15 digits)

flowchart TD Start(["clean_phone_number()"]) --> NullCheck{"Is phone null/empty?"} NullCheck --> |Yes| ReturnNone["Return None"] NullCheck --> |No| Strip["Strip whitespace"] Strip --> RemoveSep["Remove separators (-, (, ), ., space)"] RemoveSep --> KeepDigits["+ and digits only"] KeepDigits --> LeadingZero{"Not international and starts with 0?"} LeadingZero --> |Yes| RemoveZero["Remove leading zero"] LeadingZero --> |No| CheckPlus{"Already starts with +?"} RemoveZero --> CheckPlus CheckPlus --> |No| LengthCheck{"Length > 10?"} CheckPlus --> |Yes| ValidateLen["Validate digit count (7-15)"] LengthCheck --> |Yes| AddPlus["Prefix with +"] LengthCheck --> |No| ValidateLen AddPlus --> ValidateLen ValidateLen --> Valid{"Valid length?"} Valid --> |Yes| ReturnClean["Return cleaned number"] Valid --> |No| ReturnNone

Diagram sources

Section sources

Supported CSV Formats and Column Naming Variations#

Supported file formats:

  • CSV: comma-separated values

  • Excel: .xlsx and .xls files

  • Text: plain text files (one contact per line)

Column naming variations recognized by the automatic detection:

  • Phone-related headers: phone, number, mobile, cell, tel

  • Name-related headers: name, contact, person

Examples of typical column headers:

  • Phone: Phone, Mobile, Cell, Tel, Number, PhoneNumber

  • Name: Name, Contact, Person, FullName, DisplayName

Note: If no matching headers are found, the system defaults to the first column for phone and the second column for name if available.

Section sources

Common Parsing Errors and Solutions#

Common issues and resolutions:

  • Empty or malformed CSV: The fallback manual parser handles basic CSV files even when pandas fails

  • Non-UTF-8 encoding: The fallback parser explicitly opens files with UTF-8 encoding

  • Missing headers: The system defaults to first and second columns when headers do not match keywords

  • Invalid phone numbers: Numbers outside the 7–15 digit range are ignored

  • Mixed separators: The cleaning routine removes separators and validates digits

Section sources

Dependency Analysis#

The CSV extraction feature relies on the following external dependencies:

  • pandas: for reading CSV and Excel files

  • openpyxl and xlrd: for Excel file support

  • csv: for manual CSV parsing fallback

  • re: for regex-based phone number detection and cleaning

  • Flask and CORS: for the API layer

graph TB EXTRACT["extract_contacts.py"] --> PANDAS["pandas"] EXTRACT --> CSV["csv"] EXTRACT --> RE["re"] APP["app.py"] --> PANDAS APP --> CSV APP --> RE APP --> FLASK["Flask"] APP --> CORS["Flask-CORS"] REQ["requirements.txt"] --> PANDAS REQ --> OPENPYXL["openpyxl"] REQ --> XLRD["xlrd"] REQ --> FLASK REQ --> CORS

Diagram sources

Section sources

Performance Considerations#

  • Prefer UTF-8 encoded files to avoid fallback parsing overhead

  • Use consistent column headers to reduce fallback logic

  • Limit file size and row count for optimal pandas performance

  • Validate phone numbers early to minimize downstream processing

Troubleshooting Guide#

  • CSV parsing fails: Ensure the file is UTF-8 encoded and has consistent separators

  • No contacts extracted: Verify column headers match phone/name keywords or accept default column selection

  • Invalid phone numbers ignored: Confirm numbers contain 7–15 digits after cleaning

  • Excel files not supported: Install required dependencies (openpyxl/xlrd) as listed in requirements

Section sources

Conclusion#

The CSV contact extraction feature provides robust automatic column detection, resilient fallback parsing, and comprehensive phone number cleaning with international formatting. By following the supported formats and naming conventions, users can reliably import contacts from CSV files for bulk messaging workflows.